---
title: "Replication for: The Long-Run Effects of Religious Persecution: Evidence from the Spanish Inquisition"
subtitle: "GOV 2001"
author: "Noah Dasanaike and Luchuan Deng"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

library(tidyverse)
library(mapSpain)
library(haven)
library(sf)
library(ggspatial)
library(ggthemes)
library(stargazer)
library(binsreg)
library(stringr)
```

## Abstract

This paper stands as a replication for Drelichman et al. (2020) on the relationship between the Spanish Inquisition and contemporary economic outcomes at the municipality level, holding religiosity and wealth while "ruling out potential selection bias." We successfully used the original data set provided by the authors to conduct the replication, and to discern possible omitted variable bias in the model specification, as well as the true mechanism that the Inquisition treatment might be actually capturing. After replicating all of the figures and tables from the original paper, we will later extend the analysis to include such covariates as geographical distance from regional centres of importance, population density, agricultural production and the industrial revolution (which we believe the Inquisition is merely a proxy of), and, possibly, the historical legacies of being occupied by the Ottoman Empire. With that being said, the component presented here is exclusively the replication of the original paper without extension. Note that all of the authors' original code was in Stata. We did not reference the authors' Stata code except to cross-validate their results in both our R replication and their code, as well as in reference for running the coarsened exact matching (CEM; to verify which covariates they match on).

Our RMD file is available here: https://github.com/noahdasanaike/2001_replication

```{r, include = FALSE}
data <- read_dta("Inquisition_analysis_dataset.dta")
munic <- esp_get_munic() %>% 
  mutate(name = str_replace_all(name, " / ", "/"))

merge <- data %>% 
  select(muni_name, adj_impact) %>%
  rename(name = muni_name) %>% 
  inner_join(munic %>% filter(!ine.ccaa.name %in% c("Canarias")) %>% select(name, geometry))
```

```{r}
province <- munic %>% 
  group_by(ine.ccaa.name) %>%
  filter(!ine.ccaa.name %in% c("Canarias")) %>% 
  summarize(geometry = st_union(geometry))
```

\newpage
## Figure 1

To begin, we gathered the data applied by the original paper and in order to show its basic features, we replicate Figure 1 of the paper. We plotted the distribution of Inquisition intensity across Spain at the municipality level and, from the figure, we see that the south, northeast and mid-west regions disproportionately experienced more of the Inquisition. The pattern generally follows the current economic distribution in Spain. One thing that needs to be clarified is a notable absence of--rather than coding of zero for--inquisition data for the northwest region of the country. These localities are shown in the legend interval (-Inf, 0]. This missing data in the explanatory variable primarily falls in the wealthier northwestern region of the country, which could be confounding the demonstrated relationship. These results perfectly replicated. Our figure is on the left, and the original is on the right.

\ 

```{r, warning = FALSE, message = FALSE, echo = FALSE, fig.show = "hold", out.width = "50%"}
#### FIGURE 1

figure_one <- merge %>%
  ggplot() +
  geom_sf(aes(geometry = geometry, fill = cut(adj_impact, 
                                              breaks = c(-Inf, 0, .025, .05, .075, .1, 1))), 
          lty = .1) +
  scale_fill_brewer("Inquisition Intensity", palette = "Blues") +
  geom_sf(data = province, alpha = 0, color = "black") +
   annotation_scale(location = "bl", width_hint = 0.5) +
    annotation_north_arrow(location = "bl", which_north = "true", pad_y = unit(.5, "in"),
        style = north_arrow_fancy_orienteering) +
  theme_map() +
  theme(legend.justification = c(1, 0), legend.position = c(1, 0),
        legend.text = element_text(size = 7),
        legend.key.size = unit(.5, 'cm'),
        legend.title = element_text(size = 9))

# ggsave(plot = figure_one, "figure1.png", device = "png")

knitr::include_graphics(path = c("figure1.png", "original/orig_figure1.png"))
```

\newpage
## Figure 2

We then went on to replicate Figure 2. This figure accomplishes two primary purposes: it first considers the relation between GDP per capita as estimated through nightlight density; secondly, it clusters the Spanish municipalities into 4 different groups based on the intensity of the Inquisition. The original paper used this figure as its first empirical evidence of the influence of inquisition on contemporary economic development. The results perfectly replicated. Our figure is on the left; the original figure is on the right.

\ 

```{r, warning = FALSE, message = FALSE, echo = FALSE, fig.show = "hold", out.width = "50%"}
#### FIGURE 2
zero_fig_two <- data %>% 
  filter(gdppc > 12000 & gdppc < 25000, adj_impact == 0) %>%
  mutate(tercile = "0")
  
figure_two <- data %>% 
  filter(gdppc > 12000 & gdppc < 25000, adj_impact > 0) %>%
  mutate(tercile = as.character(fabricatr::split_quantile(adj_impact, 3))) %>%
  bind_rows(zero_fig_two) %>%
  ggplot(aes(x = gdppc, linetype = tercile)) +
  geom_line(stat = "density") + 
  labs(x = "GDP p.c. in \u20ac", y = "Density") + 
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line(colour = "black"),
        legend.key = element_blank(),
        legend.position = "top",
        axis.text.x = element_text(color = "black"),
        axis.text.y = element_text(color = "black"),
        legend.text = element_text(size = 7),
        legend.key.size = unit(.5, 'cm'),
        legend.title = element_text(size = 9)) +
  xlim(10000, 25000) +
  scale_linetype_manual(labels = c("no impact/missing data",
                                  "lowest tercile, not zero",
                                  "middle tercile",
                                  "highest tercile"),
                        values = c("solid", "dashed", "dotted", "dotdash"),
                        name = "") +
  guides(linetype = guide_legend(nrow = 2 , byrow = TRUE)) + 
  geom_hline(yintercept = 0, colour = "white", size = 1.5) +
  scale_y_continuous(breaks = c(0, .0002, .0004),
                     limits = c(0, .0005),
                     labels = scales::label_number())

# ggsave(plot = figure_two, "figure2.png", device = "png")

knitr::include_graphics(path = c("figure2.png", "original/orig_figure2.png"))
```

\newpage
## Table 1

After the two figures, we controlled exactly the same variables as the paper did and replicated Table 1 in the paper.  We leave the covariates at their third decimal place, which is a little more accurate than in the original paper. Beyond this minor detail, our table replication produced the exact same results as in the original paper, and we can observe a significant negative relationship between inquisitorial intensity and modern-day economic outcomes at the municipality level. The coefficients, standard errors, observations, and $R^2$ are equivalent. Our table comes first; the original table proceeds it.

\ 


```{r, results = "asis", warning = FALSE, message = FALSE, echo = FALSE}
#### TABLE 1
table_one_a <- lm(log_gdppc ~ adj_impact + log_pop + latitude + longitude + rugged + 
                    dist_trib + dist_river + dist_road + dist_sea + civil_married + 
                    class_upper + age + as.factor(autonomia), data %>% filter(trib_head == 0)) 

table_one_b <- lm(religious ~ adj_impact + log_pop + latitude + longitude + rugged + 
                    dist_trib + dist_river + dist_road + dist_sea + civil_married + 
                    class_upper + age + as.factor(autonomia), data %>% filter(trib_head == 0)) 

table_one_c <- lm(c_secondplus ~ adj_impact + log_pop + latitude + longitude + rugged + 
                    dist_trib + dist_river + dist_road + dist_sea + civil_married + 
                    class_upper + age + as.factor(autonomia), data %>% filter(trib_head == 0)) 

table_one_d <- lm(trust2 ~ adj_impact + log_pop + latitude + longitude + rugged + 
                    dist_trib + dist_river + dist_road + dist_sea + civil_married + 
                    class_upper + age + as.factor(autonomia), data %>% filter(trib_head == 0)) 

star = stargazer(table_one_a, table_one_b, table_one_c, table_one_d, type = "latex", 
          header = FALSE, title = "",
          omit = "autonomia",
          dep.var.labels = c("log GDP p.c.", "religiosity", "education", "trust"),
          covariate.labels = c("inquisitorial intensity", "log population", "latitude",
                               "longitude", "ruggedness", "distance to tribunal",
                               "distance to river", "distance to road", "distance to sea",
                               "share married", "share upper class", "average age",
                               "constant"),
          add.lines = list(c("regional FE", "Yes", "Yes", "Yes", "Yes")),
          omit.stat = c("adj.rsq", "ser", "f"))

star = sub('^.+\\caption.+$','', star)

knitr::include_graphics(path = c("original/table1.png"))
```

\newpage
## Figure 3

Using the paper’s data on inquisition density and the density of pre-1480 religious figures, we plot their relationship in our replication of Figure 3. As can be observed, the trend is consistent with what the paper has been arguing: inquisition intensity is higher in places where the density of pre-1480 religious figures is lower. This plot perfectly replicated, and required running `binsreg` after manually calculating `20` bins; the original code was in Stata. Our plot is on the left; the original is on the right. The figure dimensions are slightly different but otherwise the figures are equivalent.

\ 

```{r, warning = FALSE, message = FALSE, echo = FALSE, fig.show = "hide"}
#### FIGURE 3
fig_3_data <- binsreg(data$adj_impact, data$log_den_rel_pre,
        nbins = 20)$data.plot[[1]][["data.dots"]]
```


```{r, warning = FALSE, message = FALSE, echo = FALSE, fig.show = "hold", out.width = "50%"}
figure_three <- fig_3_data %>% 
  ggplot(aes(x, fit)) +
  geom_point(shape = 1, color = "navy") + 
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              lty = "dashed", color = "coral3") +
  labs(x = "log of density of pre-1480 religious figures", 
       y = "inquisitorial intensity") + 
  theme(axis.line = element_line(colour = "black"),
        axis.text.x = element_text(color = "black"),
        axis.text.y = element_text(color = "black")) +
  scale_y_continuous(breaks = c(.01, .015, .02, .025, .03),
                     limits = c(.01, .03),
                     labels = scales::label_number()) +
  scale_x_continuous(breaks = c(-5:-1),
                     limits = c(-5, -1)) +
  theme_bw()

# ggsave(plot = figure_three, "figure3.png", device = "png")

knitr::include_graphics(path = c("figure3.png", "original/orig_figure3.png"))
```
\newpage

## Figure 4 

Next, we replicate the comparison map between the inquisitorial intensity and income per capita by municipality in the case of Tribunal of Seville for Figure 4.  The figure demonstrates the possible power of explanation using hospitals’ existence as dummy for the economic development level of municipalities in the 1800s. Our figure perfectly replicated theirs. Our set of two plots is first; the original proceeds it. 

\ 

```{r, warning = FALSE, message = FALSE, echo = FALSE, out.width = "80%"}
### Figure 4

inq_int_plot <- data %>% 
  select(muni_name, adj_impact, court_geo, hospitals) %>%
  rename(name = muni_name) %>% 
  inner_join(munic %>% filter(!ine.ccaa.name %in% c("Canarias")) 
             %>% select(name, geometry)) %>% 
  filter(court_geo == 6) %>%
  mutate(hospital_label = ifelse(hospitals == 0, "*", "H")) %>% 
  ggplot() +
  geom_sf(aes(geometry = geometry, fill = cut(adj_impact, 
                                              breaks = c(-Inf, 0, .025, .05, .075, .1, 1))), 
          size = .3,
          color = "black") +
  scale_fill_brewer("", palette = "Blues") +
  theme_map() +
  theme(legend.justification = c(0, 0), legend.position = c(0, 0),
        legend.key.size = unit(.6, 'cm'),
        plot.title.position = "plot") +
  scale_y_continuous(limits = c(35, 39)) +
  geom_sf_text((aes(geometry = geometry, label = hospital_label))) +
  labs(title = "Inquisitorial Intensity")

gdp_cap_plot <- data %>% 
  select(muni_name, gdppc, court_geo) %>%
  rename(name = muni_name) %>% 
  inner_join(munic %>% filter(!ine.ccaa.name %in% c("Canarias")) 
             %>% select(name, geometry)) %>% 
  filter(court_geo == 6) %>%
  mutate(gdp_quart = as.factor(ntile(gdppc, 4))) %>%
  ggplot() +
  geom_sf(aes(fill = gdp_quart, geometry = geometry), 
          size = .3,
          color = "black") +
  scale_fill_brewer("", palette = "Blues", labels = c("First quartile", 
                                                      "Second quartile",
                                                      "Third quartile",
                                                      "Fourth quartile")) +
  theme_map() +
  theme(legend.justification = c(0, 0), legend.position = c(0, 0),
        legend.key.size = unit(.6, 'cm'),
        plot.title.position = "plot") +
  scale_y_continuous(limits = c(35, 39)) +
  labs(title = "GDP per Capita")

gridExtra::grid.arrange(inq_int_plot, gdp_cap_plot, nrow = 1)
```

\ 

```{r, echo = FALSE, out.width = "80%"}
knitr::include_graphics(path = c("original/orig_figure4.png"))
```
\newpage 

## Table 2 

Finally, we replicate the use of the CEM model to test the accuracy of OLS regression, as has been applied by the original paper. In Table 2, one can observe that the covariates of the OLS and CEM model share similar value and positive/negative directionality. Their significances are similar as well. The OLS estimates perfectly match, as do their $R^2$ values. The CEM presented some difficulty as the original models were run in Stata, but the replication eventually succeeded. The significance and directionality of these estimates is largely retained, though there is a very minor difference for each (i.e., -.987 in our replication on trust, -.721 for theirs). This is actually not the case, however, for the religiosity outcome, for which the OLS and CEM estimates in our replication model are no longer statistically significant at $alpha = .1$. However, we have verified extensively that our models are correctly specified in alignment with the authors. That the $R^2$ of our OLS exactly matches that of the authors', and the standard error to a slightly lesser extent, for the religiosity model attests to this fact. This will require further exploration in the paper extension. The others estimates are much closer and the $R^2$ values are approximately equal. We ran the authors' Stata code to verify their own results, but present here our replication in R. The difference in observations between our replication and theirs is actually false; Stargazer is printing the total number of observations, rather than only those which are matched. This has no standing on the coefficient estimates-- manually printing the `matchit()` object reveals that the observation count is as expected. Our table comes first; the original proceeds it.

\ 

```{r, results = "asis", echo = FALSE, include = FALSE}
### Table 2
library(cem)
library(MatchIt)

table_two_a_ols <- lm(log_gdppc ~ adj_impact + log_pop + latitude + longitude + rugged + 
                    dist_trib + dist_river + dist_road + dist_sea + civil_married + 
                    class_upper + age + as.factor(autonomia) + hospitals, 
                  data %>% filter(trib_head == 0)) 

table_two_b_ols <- lm(religious ~ adj_impact + log_pop + latitude + longitude + rugged + 
                    dist_trib + dist_river + dist_road + dist_sea + civil_married + 
                    class_upper + age + as.factor(autonomia) + hospitals, 
                  data %>% filter(trib_head == 0)) 

table_two_c_ols <- lm(c_secondplus ~ adj_impact + log_pop + latitude + longitude + rugged + 
                    dist_trib + dist_river + dist_road + dist_sea + civil_married + 
                    class_upper + age + as.factor(autonomia) + hospitals, 
                  data %>% filter(trib_head == 0)) 

table_two_d_ols <- lm(trust2 ~ adj_impact + log_pop + latitude + longitude + rugged + 
                    dist_trib + dist_river + dist_road + dist_sea + civil_married + 
                    class_upper + age + as.factor(autonomia) + hospitals, 
                  data %>% filter(trib_head == 0))  

star_two_ols = stargazer(table_two_a_ols, table_two_b_ols, table_two_c_ols, table_two_d_ols, 
                         type = "latex",
                         title = "Panel A: OLS",
                         keep = ("adj_impact"),
                         dep.var.labels = c("log GDP p.c.", "religiosity", "education", "trust"),
                         covariate.labels = c("inquisitorial intensity"),
                         omit.stat = c("adj.rsq", "ser", "f"),
                         header = FALSE)


#####

master_cem_data <- data %>%
  filter(trib_head == 0) %>% 
  select(pop_padron, pop1528, latitude, longitude, rugged, dist_trib, dist_road, 
         dist_river, dist_sea,impacthigh, adj_impact, log_pop, civil_married,
         class_upper, age, autonomia, log_gdppc, religious, c_secondplus, trust2)

cem_data <- master_cem_data %>% 
  select(pop_padron, pop1528, latitude, longitude, rugged, dist_river, dist_sea,
         impacthigh)

cem_data <- data.frame(cem_data)
master_cem_data <- data.frame(master_cem_data)

table_two_cem <- cem(treatment = "impacthigh", data = cem_data) 

table_two_a_cem <- lm(log_gdppc ~ adj_impact + log_pop + latitude + longitude + rugged + 
                    dist_trib + dist_river + dist_road + dist_sea + civil_married +
                      class_upper + age + as.factor(autonomia), data = master_cem_data,
                    weights = table_two_cem$w)

table_two_b_cem <- lm(religious ~ adj_impact + log_pop + latitude + longitude + rugged + 
                    dist_trib + dist_river + dist_road + dist_sea + civil_married +
                      class_upper + age + as.factor(autonomia), data = master_cem_data, 
                      weight = table_two_cem$w)

table_two_c_cem <- lm(c_secondplus ~ adj_impact + log_pop + latitude + longitude + rugged + 
                    dist_trib + dist_river + dist_road + dist_sea + civil_married +
                      class_upper + age + as.factor(autonomia), data = master_cem_data, 
                      weight = table_two_cem$w) 

table_two_d_cem <- lm(trust2 ~ adj_impact + log_pop + latitude + longitude + rugged + 
                    dist_trib + dist_river + dist_road + dist_sea + civil_married +
                      class_upper + age + as.factor(autonomia), data = master_cem_data, 
                      weight = table_two_cem$w)

star_two_cem = stargazer(table_two_a_cem, table_two_b_cem, table_two_c_cem, table_two_d_cem, 
                         title = "Panel B: CEM",
                         keep = ("adj_impact"),
                         dep.var.labels = c("log GDP p.c.", "religiosity", "education", "trust"),
                         covariate.labels = c("inquisitorial intensity"),
                         omit.stat = c("adj.rsq", "ser", "f"),
                         type = "latex",
                         header = FALSE)
```


```{r, results = "asis", fig.show = "hold", out.width = "100%", echo = FALSE}
cat(star_two_ols)
cat(star_two_cem)

knitr::include_graphics(path = c("original/table2.png"))
```

\newpage 

This replication has inspired us to think further about our extension. The models appear properly specified and all replicated well. However, there is a very significant confounder that will be the focus of our analysis extension, and which we are currently searching for the respective data to measure. The industrial revolution was heavily concentrated in the northwest of Spain, while the Inquisition was concentrated along the primarily agricultural south and northeast regions of the country. Thus, the Inquisition effect as measured here may simply be a consequence of the industrial revolution benefiting the northwest region of the country disproportionately. Previous papers have focused on estimating this data and we are in the process of obtaining it from the authors.

Similarly, during 711 and 1492 the Muslim Ottoman regime controlled nearly all of Spain with its capital and hinterland located in Southern Spain, mainly in Andalusia. The history of being most severely influenced by Muslim regime may count for the fact that these places (southern Spain, as we can see in Figures 1 and 3) have lower level of religiosity as well as higher level of inquisition - the people, parish and religious influence left by previous regime should be thoroughly "cleansed". Therefore, we would like to include the historical data on the Muslim occupation in southern Spain to see whether this history of being conquered could actually explain the relation between inquisition intensity and the "low religious intensity." An ongoing research project measures this effect at the municipality level in Spain, and we have contacted the authors. 

Beyond these two major extensions, we have already manually calculated our proposed additional variables pertaining to population density and change over time. We are actively brainstorming other covariates that can improve the coefficient of determination. As such, the replication is coming along well. 

```{r}
prov_test <- data %>% 
    drop_na(adj_impact, area, log_pop, rugged, dist_trib, dist_river, dist_road, 
            dist_sea, civil_married, class_upper, age) %>% 
    group_by(autonomia) %>% 
    summarize(n = n())
```

```{r, include = FALSE}
#### EXTENSION
historical_regions <- readxl::read_xlsx("historical_provinces.xlsx") %>% 
  mutate(historical_capital_long = as.numeric(historical_capital_long))

historical_regions <- st_as_sf(historical_regions, coords = 
                                                         c("historical_capital_long", 
                                                           "historical_capital_lat"), crs = 4386) %>% 
  rename(historical_region_geom = geometry)

agriculture <- readxl::read_xls("agri.xls") %>%
  mutate(used_agricultural_area = lag(used_agricultural_area),
         total_livestock = lag(total_livestock),
         agricultural_holdings = lag(agricultural_holdings)) %>%
  rowwise() %>% 
  mutate(municipality = str_split(municipality, " ")[[1]][1]) %>% 
  head(-6) %>% 
  filter(municipality != "2009") %>% 
  mutate(municipality = as.numeric(municipality)) %>% 
  rename(preferred_inecode = municipality)

data_modified <- data %>% 
  left_join(historical_regions) %>% 
  left_join(agriculture) %>% 
  rename(name = muni_name) %>% 
  left_join(munic %>% select(ine.ccaa.name, name)) %>% 
  mutate(density = pop_padron / area) %>% 
  rename(communities = ine.ccaa.name)
```

```{r}
data_modified$geometry_transformed <- st_set_crs(data_modified$geometry, 4386)
# 
# data_modified$capital_dist <- NA
# data_modified$regional_city_dist <- NA
# capital <- st_as_sf(tibble(lat = c(40.416667), long = c(-3.716667)), coords = c("long", "lat"), crs = 4386)
# 
# for (i in 1:nrow(data_modified)){
#   data_modified$capital_dist[i] <- st_distance(data_modified$geometry_transformed[i],
#                         capital)
# 
#   data_modified$regional_city_dist[i] <- st_distance(data_modified$geometry_transformed[i],
#                           data_modified$historical_region_geom[i])
# }
# 
# city_distances <- data_modified %>%
#   select(munisid, regional_city_dist, capital_dist)
# 
# write.csv(city_distances, "city_distances.csv")

city_distances <- read_csv("city_distances.csv") %>%
  select(-c(X1, regional_city_dist)) %>% 
  group_by(munisid) %>% 
  summarize(capital_dist = min(capital_dist))

data_modified <- data_modified %>%
  left_join(city_distances, by = "munisid")
```

```{r}
# mosques <- readxl::read_xlsx("mosque locations.xlsx") %>% 
#   mutate(former_mosque_long = as.numeric(former_mosque_long))
# 
# mosques <- st_as_sf(mosques, coords = c("former_mosque_long", "former_mosque_lat"), crs = 4386) %>%
#   rename(historical_mosque_geom = geometry)
# 
# data_modified$historical_mosque_dist <- NA
# 
# for (i in 1:nrow(data_modified)){
#   distances <- st_distance(data_modified$geometry_transformed[i], mosques)
# 
#   data_modified$historical_mosque_dist[i] <- min(as.vector(distances))
# }
# 
# mosque_distances <- data_modified %>%
#   select(munisid, historical_mosque_dist)
# 
# write.csv(mosque_distances, "mosque_distances.csv")

mosque_distances <- read_csv("mosque_distances.csv") %>%
  select(-c(X1)) %>% 
  group_by(munisid) %>% 
  summarize(historical_mosque_dist = min(historical_mosque_dist))

data_modified <- data_modified %>%
  left_join(mosque_distances)
```

```{r}
# regional_centre <- data_modified %>%
#   select(name, geometry_transformed, division_1833, pop_padron) %>%
#   group_by(division_1833) %>%
#   summarize(regional_centre = geometry_transformed[which.max(pop_padron)])
# 
# data_modified <- left_join(data_modified, regional_centre %>% select(division_1833, regional_centre))
# 
# data_modified$regional_centre_dist <- NA
# 
# for (i in 1:nrow(data_modified)){
#   distance <- st_distance(data_modified$geometry_transformed[i],
#                            data_modified$regional_centre[i])
# 
#   data_modified$regional_centre_dist[i] <- max(as.vector(distance))
# }
# 
# regional_distances <- data_modified %>%
#   select(munisid, regional_centre_dist)
# 
# write.csv(regional_distances, "regional_distances.csv")

regional_distances <- read_csv("regional_distances.csv") %>%
  group_by(munisid) %>%
  summarize(regional_centre_dist = min(regional_centre_dist))

data_modified <- data_modified %>%
  left_join(regional_distances)
```

```{r}
# andorra <- st_as_sf(tibble(lat = c(42.5), long = c(1.516667)),
#                     coords = c("long", "lat"), crs = 4386)
# 
# data_modified$andorra_distance <- NA
# 
# for (i in 1:nrow(data_modified)){
#   distance <- st_distance(data_modified$geometry_transformed[i],
#                            andorra)
# 
#   data_modified$andorra_distance[i] <- max(as.vector(distance))
# }
# 
# andorra_distance <- data_modified %>%
#   select(munisid, andorra_distance)
# 
# write.csv(andorra_distance, "andorra_distance.csv")

andorra_distance <- read_csv("andorra_distance.csv") %>%
  group_by(munisid) %>%
  summarize(andorra_distance = min(andorra_distance))

data_modified <- data_modified %>%
  left_join(andorra_distance)
```

```{r}
# tangier <- st_as_sf(tibble(lat = c(35.776667), long = c(-5.803889)), coords = c("long", "lat"), crs = 4386)
# 
# data_modified$tangier_distance <- NA
# 
# for (i in 1:nrow(data_modified)){
#   distance <- st_distance(data_modified$geometry_transformed[i],
#                            tangier)
# 
#   data_modified$tangier_distance[i] <- max(as.vector(distance))
# }
# 
# tangier_distance <- data_modified %>%
#   select(munisid, tangier_distance)
# 
# write.csv(tangier_distance, "tangier_distance.csv")

tangier_distance <- read_csv("tangier_distance.csv") %>%
  group_by(munisid) %>%
  summarize(tangier_distance = min(tangier_distance))

data_modified <- data_modified %>%
  left_join(tangier_distance)
```

```{r}
data_modified$data_1528 <- ifelse(is.na(data_modified$pop1528), 0, 1)

data_modified$north_of_madrid <- NA

for (i in 1:nrow(data_modified)){
  data_modified$north_of_madrid[i] <- 
    ifelse(st_is_empty(data_modified$geometry_transformed[i]), NA,
           ifelse(mean(st_coordinates(data_modified$geometry_transformed[i])[,2]) > 40.4168, 1, 0))
}
```

```{r}
data_modified$regional_centre_dist_km <- data_modified$regional_centre_dist / 10000
data_modified$historical_mosque_dist_km <- data_modified$historical_mosque_dist / 10000
data_modified$capital_dist_km <- data_modified$capital_dist / 10000
data_modified$tangier_distance_km <- data_modified$tangier_distance / 10000
data_modified$andorra_distance_km <- data_modified$andorra_distance / 10000
```


```{r, results = "asis"}
table_one_a_modified <- lm(log_gdppc ~ adj_impact + log_pop + rugged + area + density + data_1528 +
             north_of_madrid + regional_centre_dist_km + historical_mosque_dist_km + capital_dist_km + 
             tangier_distance_km + andorra_distance_km + dist_trib + dist_river + 
             dist_road + dist_sea + civil_married + class_upper + age + 
           as.factor(division_1833), 
                    data_modified %>% filter(trib_head == 0))

table_one_b_modified <- lm(religious ~ adj_impact + log_pop + rugged + area + density + data_1528 +
             north_of_madrid + regional_centre_dist_km + historical_mosque_dist_km + capital_dist_km + 
             tangier_distance_km + andorra_distance_km +  + dist_trib + dist_river + 
             dist_road + dist_sea + civil_married + class_upper + age + 
           as.factor(division_1833), 
                    data_modified %>% filter(trib_head == 0))

table_one_c_modified <- lm(c_secondplus ~ adj_impact + log_pop + rugged + area + density +
             data_1528 + north_of_madrid + regional_centre_dist_km + historical_mosque_dist_km + capital_dist_km + 
             tangier_distance_km + andorra_distance_km + dist_trib + dist_river + 
             dist_road + dist_sea + civil_married + class_upper + age + 
           as.factor(division_1833), 
                    data_modified %>% filter(trib_head == 0))

table_one_d_modified <- lm(trust2 ~ adj_impact + log_pop + rugged + area + density + 
             data_1528 + north_of_madrid + regional_centre_dist_km + historical_mosque_dist_km + capital_dist_km + 
             tangier_distance_km + andorra_distance_km + dist_trib + dist_river + 
             dist_road + dist_sea + civil_married + class_upper + age + 
           as.factor(division_1833), 
                    data_modified %>% filter(trib_head == 0))

star_modified = stargazer(table_one_a_modified, 
                          table_one_b_modified, 
                          table_one_c_modified, 
                          table_one_d_modified, 
                         type = "latex",
                         title = "Panel A: OLS",
                         dep.var.labels = c("log GDP p.c.", "religiosity", "education", "trust"),
                         covariate.labels = c("inquisitorial intensity", 
                                              "area", "density",
                                              "municipality existed in 1528",
                                               "north of madrid",
                                              "distance to regional centre of importance",
                                              "distance to historical mosque",
                                              "distance to capital",
                                              "distance to tangier, morocco",
                                              "distance to andorra", "constant"),
                         keep = c("adj_impact", "area", "density", "data_1528",
                                  "north_of_madrid",
                                  "regional_centre_dist", "historical_mosque_dist",
                                  "capital_dist", "tangier_distance", "andorra_distance",
                                  "Constant"),
                         omit = c("division_1833"),
                         omit.stat = c("adj.rsq", "ser", "f"),
                         add.lines = list(c("1833 Provinces FE", "Yes", "Yes", "Yes", "Yes")),
                         header = FALSE)
```

```{r, results = "asis"}
master_cem_data <- data_modified %>%
  filter(trib_head == 0) %>% 
  select(pop_padron, pop1528, latitude, longitude, rugged, dist_trib, dist_road, 
         dist_river, dist_sea, regional_centre_dist_km, historical_mosque_dist_km, capital_dist_km,  
         impacthigh, adj_impact, log_pop, civil_married,
         class_upper, age, autonomia, log_gdppc, religious, c_secondplus, trust2, division_1833, area,
         data_1528, north_of_madrid, tangier_distance_km, andorra_distance_km, density, area)

cem_data <- master_cem_data %>% 
  select(pop_padron, pop1528, latitude, longitude, rugged, dist_river, dist_sea,
         impacthigh, regional_centre_dist_km, historical_mosque_dist_km, capital_dist_km,
         tangier_distance_km, andorra_distance_km)

cem_data <- data.frame(cem_data)
master_cem_data <- data.frame(master_cem_data)

table_two_cem <- cem(treatment = "impacthigh", data = cem_data) 

table_two_a_cem <- lm(log_gdppc ~ adj_impact + log_pop + rugged + area + density + data_1528 +
             north_of_madrid + regional_centre_dist_km + historical_mosque_dist_km + capital_dist_km + 
             tangier_distance_km + andorra_distance_km + dist_trib + dist_river + 
             dist_road + dist_sea + civil_married + class_upper + age + 
           as.factor(division_1833), data = master_cem_data,
                    weights = table_two_cem$w)

table_two_b_cem <- lm(religious ~ adj_impact + log_pop + rugged + area + density + data_1528 +
             north_of_madrid + regional_centre_dist_km + historical_mosque_dist_km + capital_dist_km + 
             tangier_distance_km + andorra_distance_km + dist_trib + dist_river + 
             dist_road + dist_sea + civil_married + class_upper + age + 
           as.factor(division_1833), data = master_cem_data,
                    weights = table_two_cem$w)

table_two_c_cem <- lm(c_secondplus ~ adj_impact + log_pop + rugged + area + density + data_1528 +
             north_of_madrid + regional_centre_dist_km + historical_mosque_dist_km + capital_dist_km + 
             tangier_distance_km + andorra_distance_km + dist_trib + dist_river + 
             dist_road + dist_sea + civil_married + class_upper + age + 
           as.factor(division_1833), data = master_cem_data,
                    weights = table_two_cem$w) 

table_two_d_cem <- lm(trust2 ~ adj_impact + log_pop + rugged + area + density + data_1528 +
             north_of_madrid + regional_centre_dist_km + historical_mosque_dist_km + capital_dist_km + 
             tangier_distance_km + andorra_distance_km + dist_trib + dist_river + 
             dist_road + dist_sea + civil_married + class_upper + age + 
           as.factor(division_1833), data = master_cem_data,
                    weights = table_two_cem$w)

star_two_cem = stargazer(table_two_a_cem, table_two_b_cem, table_two_c_cem, table_two_d_cem, 
                         title = "Panel B: CEM",
                         keep = ("adj_impact"),
                         dep.var.labels = c("log GDP p.c.", "religiosity", "education", "trust"),
                         covariate.labels = c("inquisitorial intensity"),
                         omit.stat = c("adj.rsq", "ser", "f"),
                         type = "latex",
                         header = FALSE)
```

```{r}
source("new_aes.R")

division_1833 <- data_modified %>% 
  group_by(division_1833) %>%
  summarize(geometry = st_union(geometry))

data_modified %>%
  mutate(regional_centre = ifelse(regional_centre_dist_km == 0, TRUE, FALSE)) %>% 
  ggplot() +
  geom_sf(aes(geometry = geometry, fill = regional_centre_dist_km), 
          lty = .1)  +
  scale_fill_continuous(name = "Distance from Regional Centre (km)") +
  geom_sf(aes(geometry = geometry, alpha = regional_centre),
          fill = "red", 
          lty = .1) +
  geom_sf(data = division_1833, aes(geometry = geometry), alpha = 0, color = "black") +
  annotation_scale(location = "bl", width_hint = 0.5) +
  theme_map() +
  theme(legend.justification = c(1, 0), legend.position = c(1, 0),
        legend.text = element_text(size = 7),
        legend.key.size = unit(.5, 'cm'),
        legend.title = element_text(size = 9)) +
  scale_alpha_discrete(name = "Regional Metropolitan Area", range = c(0, 1), na.value = 0)

data_modified %>%
  mutate(historic_centre = ifelse(historical_mosque_dist_km == 0, TRUE, FALSE)) %>% 
  ggplot() +
  geom_sf(aes(geometry = geometry, fill = historical_mosque_dist_km), 
          lty = .1)  +
  scale_fill_continuous(name = "Distance from Historical Mosque (km)") +
  geom_sf(aes(geometry = geometry, alpha = historic_centre),
          fill = "red", 
          lty = .1) +
  geom_sf(data = division_1833, aes(geometry = geometry), alpha = 0, color = "black") +
  annotation_scale(location = "bl", width_hint = 0.5) +
  theme_map() +
  theme(legend.justification = c(1, 0), legend.position = c(1, 0),
        legend.text = element_text(size = 7),
        legend.key.size = unit(.5, 'cm'),
        legend.title = element_text(size = 9)) +
  scale_alpha_discrete(name = "Historical Mosque Presence", range = c(0, 1), na.value = 0)
```
